NEW DIRECTIONS IN
EVALUATION RESEARCH: 

IMPLICATIONS FOR VOCATIONAL EDUCATION 



I would like to talk about the values underlying educational evaluation. I don't want to talk
about them theoretically, but rather from what I think is a very clear-cut point of view. I believe
that values are inherent in what people do, not just in what they say. If we study the actions we're
involved in, we can find out something about the values of evaluation. Clearly, evaluation activity
is made up of roles that different people play, sometimes at different times. I think the tasks of
evaluation vary significantly depending on the kind of program we happen to be studying. So I think
the idea of a full and perfect generalizable methodology is unfair if not impossible.

If you look at the work that evaluation people have done over the last fifteen years, you can 
see qualitative changes in what was expected, what was promised, and what was delivered. It's very
enlightening to go back to some earlier evaluation reports which were regarded as satisfactory
documents and compare them with the kinds of things that are coming out now. To put all this in
context, I think it's important to understand a little about the history of evaluation - principally
evaluation as seen by researchers or research professionals, since most of you fit that description. 

When evaluation activity became highly visible (at least ten years ago), it appealed to values
that were held by many educators. The skills in research that we had acquired were modeled more
on the science end of education than on the art end, and we thought those skills could be
applied in a fashion that would lead us to the improvement of opportunities and desired outcomes
for students. So it seemed possible to fuse several elements - first, a respect for rationality, which
as researchers we presumably had; second, the power to implement rational procedure, which as
researchers we probably didn't have, at least in a political sense; third, an altruistic objective, which
we saw as improved instructional life for students; last (and not incidentally for some), a
supplementary source of livelihood, because the short, happy life of well-supported research activity
was over.

It seems to me that the short life of research activity - of the cooperative research program -
was important in two respects. First of all, the brevity of that period bred intensity and
commitment. People developed their research skills. There were research training programs in the middle
sixties that were funded out of Title IV, the same enabling legislation that generated the labs and
center program. Much research was being done. As the research opportunities lessened, the
researchers became evaluators, an interesting job change. People assumed they could apply their
research rationality to evaluation problems. This view implied optimism. It was assumed that
evaluation shared the basic precepts of a good science - that evaluation was independent the way
research is supposed to be independent; that evaluation was orderly; that expertise was required
for its conduct; and that, by virtue of training, some people ought to be better at it than others.
Supporting beliefs honored the value of measurement, attractability, discovery of causal
relationships, suspension of disbelief for questionable human data sources and, of course, the idea of design
and control. All these research ideas were transferred to the evaluation framework.



Evaluation also proceeded in an atmosphere of mutual support. Optimism was based not only
on the idea that we could identify a treatment or determine whether or not a program was any
good, but evaluation was also to contribute to the productivity of such programs. In general, these
beliefs were shared by program developers and managers in evaluation. Government people, those
in fact responsible for contract management, were generally less well trained and were usually
inexperienced as well, so they were moderately comfortable with a well-argued plan of action that
promised substantial benefits.

From an instructional point of view, sometimes these premises were valid. We would try out
some of the theories - for example, the work that Lumsdaine and Glaser did on deciding how
instructional programs ought to be developed and tested. Pre-tests were developed, instructional
treatments were implemented, data were collected, and revision cycles recurred. We had the idea of
"Social Darwinism" - that in some sense, we were getting better and better each time as we collected
data. In fact, there was some research evidence to support this. This is the kind of model through
which I got into evaluation. It's a time-series design; we collected data and tried to make things
better.

Obviously those of you who have some background in systems analysis recognize that the kind
of work we were doing dealt essentially with closed systems where there wasn't a whole lot of
uncertainty. Program developers at that time, and I'm now talking about 1970-1971, were still in
relatively good control of the population of learners that they were dealing with. They could exclude
or include people by pre-test, and they were able to control pretty much what happened
instructionally in the treatment. As I said earlier, sometimes the desired learning did occur; and at that
point we made inferences about how good a program was by looking at student performance on
measurement instruments. Many of you can recall the evangelical fervor with which some people
pushed instructional objectives. (I was probably in that set.) The concept of learning through
instructional objectives was based on the idea that student performance ought to follow from the
instruction that is presented to them. Very often, however, we didn't ask anything about the
students except the perfunctory, "How did you like the program?" We weren't centrally interested
in the long-range effects of the program or in any serious attitude change. At most, we were
interested in instrumental information that would allow us to make the program better next time.

There were, however, "voices from the wilderness" (and many of us thought they should
remain there) that objected to the overall strategy. (At this time I was at the Southwest Regional
Lab working on the development of their reading program, and I was into the cycle in a big way.)
Critics of what we were doing identified the "top down" nature of instructional development and
noted that many personal decision rights were preempted by the developer and the evaluator and
taken away from the students. Claims were also made that our outcomes measures were, in any case,
incomplete, and probably inaccurate as well. Other considerations were voiced but summarily
dismissed in large part, I would guess, for the wrong reasons. One reason these criticisms seemed so
easy to dismiss is that similar kinds of criticisms were coming from people who were self-avowed
"protectors of humanism." Some of you may recall that the neo-humanistic movement caused a
great deal of controversy. The people involved in this movement were anti-technology,
anti-schooling, and frequently associated with encounter groups and Esalen-type experiences. They were
"typical Californians," I might say - certainly a group that's easy to discredit. Another reason for
overlooking the criticism, which I must admit with some embarrassment, was my own personal
reason: the criticism seemed to attack my sense of personal accomplishment. I couldn't help but
think, "By damn, I know I'm doing something. Why are they putting me down?" In this closed
system kind of evaluation, we thought that instructional strategies were, very loosely, perfectible.
We would also try to improve our procedures, and we thought these, too, were improvable.



The major conceptual distinction between evaluation and our former line of work, research,
was in the treatment of generalizations. At the heart of most scientific research was the effort to
find new knowledge and generalize it; at the heart of most instructional development experience
was the effort to find information about a particular program and generalize it to other similar
population groups. It seems to me that some of us searched in vain for regularities which might
allow us to consolidate and improve our methods, and I did a lot of writing about that at one point.
But such generalizations about procedures would be frosting on the cake; I might add that it would
be pink fondant roses if we found R&D procedures of general use. But there was a sort of
reluctance or inability at that point to formulate those ideas in a way that could be shared.

Now to show you something of the transition between what evaluation was like up to about
1973 and what evaluation is like now, I am going to describe a little of it, six years later. Whether
the change came about because we have a different frame of mind or because there are broad,
socially-inspired shifts that have occurred in the meantime (including the lack of faith which
President Carter's speech identified), nonetheless, there has been a change. I think part of this was
supported by the re-analysis of the studies of schooling done by Coleman and others, which tended
to point out that most of what we did was futile at the margin, and there weren't any changes
taking place that couldn't be explained better by demographic information. We had a spate of NSD
("no significant difference") research, even though in our own minds we would imagine that there
were very large treatment differences. The level of resources had changed dramatically, but we just
weren't finding any of those changes reflected in measurement. At the same time, federal research
and development activity was severely constrained, partly because of overall shifts in the government
and partly, I think, because of the ineptness of some of the appeals that went on at that time. I
would urge you to read the book by Sproull, Weiner, and Wolf called Organizing an Anarchy,
which is pertinent to this point.

At any rate, there were also different views on resource allocation at the legislative level, and
ideas such as zero-based budgeting required presumably tougher tests to be applied to programs
than we had before. At the same time, concern for educational equity, led by the courts, generated
a set of programs which might in fact be legitimate just by their very existence rather than by their
effects. The issue was equity of opportunity, not equity of outcome. Programs for bilingual children,
for example, could survive perhaps very negative evaluations because there was clearly something
that needed to be done by the government in that sector. Educational programs became, in some
situations, vehicles through which to reallocate resources rather than real treatments. This further
shifted the operating focus from outcomes to procedures. The educational systems that were now
addressed were open instead of closed with regard to the nature of the programs undertaken. Many
more local options were provided. The nature of the participation of students was more open, since
it was very difficult to restrict or exclude students on the basis of not meeting the entry level
criteria set up by the program. Students were willing or unwilling participants depending on where
they were in the transiency or mobility bands around the school. The final point is that the basic
stability required for the identification and the evaluation of educational activity just wasn't there,
especially in a longitudinal framework. If a transiency rate is 20 percent annually, over three years
the turnover is substantial. It's very difficult to do a longitudinal study now and have the idea that
one is dealing at all with the same cohort.
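
To put a rough number on that turnover, here is a minimal sketch of the arithmetic. It is an
illustration only, not a calculation given in the talk. The function name cohort_retention is
hypothetical, and the sketch assumes that the 20 percent rate applies independently each year
and that students who leave do not return.

```python
def cohort_retention(annual_transiency: float, years: int) -> float:
    """Fraction of the original cohort still enrolled after `years`.

    Assumption (not from the talk): the same transiency rate applies
    independently each year, and students who leave never come back.
    """
    return (1.0 - annual_transiency) ** years


remaining = cohort_retention(0.20, 3)   # 0.8 * 0.8 * 0.8
turnover = 1.0 - remaining
print(f"remaining: {remaining:.1%}, turned over: {turnover:.1%}")
```

Under those assumptions only about half of the starting cohort (0.8 cubed, or 51.2 percent) is
still present in the third year, which is why a three-year longitudinal comparison can hardly
claim to follow the same group of students.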

So apparently a new set of values has been adopted by people who do evaluation in more
open system evaluation settings, and I think there are some very significant differences which have
implications for how we act. One difference is that there is a new emphasis on pluralism, on
diversity, and on multiple objectives. Also, the selection of what will be incorporated into the
program is left wide open and very often subject to local preference, since it is argued that the
findings of educational research have failed to give priority to alternative courses of action anyway.
The resultant mix of activity in schools can hardly be called a program at all. (I may exclude
vocational education programs if you have better information. I'm speaking primarily about what
I know goes on in the general education programs in the public schools.) The notions of treatment
and the attendant links with causality are concepts that we can hardly deal with anymore.
Refinement of programs at the level of precision which characterized many of the curriculum development
efforts of the early labs and centers is really beyond comprehension in most of these programs now,
especially in urban settings. This is difficult for some to comprehend because much of our language
has stayed the same. We still talk about the programs as if they were the same entities we had five
or six years ago that you could hold onto in some ways, describe, and manipulate.

At the same time, evaluation roles have changed, too. Evaluators in the sixties, whether they
were looking at instructional units or broader-based policy efforts, were very much committed to
methodology and to the provision of clear information - to the use of a science base. The initial
response of evaluators to these changes, to open systems and more diffuse views of
education, was, as you might imagine, to search for better measures and better techniques - that is,
to look first to methodology as a way of solving their problem. So some of us began looking for
better ways of aggregating information, refining our designs, refining alternative methodologies, and
conducting comparative studies. There were also other titles taken during this transition. For
instance, the question was raised about the utility of different types of data and trade-offs for data
reliability. We could achieve a sense of consistency by viewing data across sites, so we made greater
use of case study versus survey kinds of research. Preferences developed among some evaluators for
looser, more interactive designs, and what I would call "soft" data sets; and sides were chosen -
"hard" opposed to "soft" - although maybe these alignments aren't necessary and are probably
dysfunctional. Radical approaches - ideas Bob Stake talked about in 1972, or Bob Rippey outlined
in 1973, and Egon Guba proposed in 1978 - cast the evaluator more as a responsive inquirer than
as a provider of purely objective views. Critics claimed this new responsiveness was only labeling
and legitimizing what was the case anyhow. They argued that we were already biased, so why didn't
we just name our biases? That is, they saw the evaluator entering with screens through which the
data and perceptions pass. Somehow we assumed that this screening would "randomize out" through
use of a great number of case studies. The participant evaluator role was also conceived as a foil to
the role of summative evaluator, the latter being assumed to adhere
strictly to comparison and choice among program options.

Other questions were raised, of course, about the objectivity of evaluation methodology.
Henry Aaron, from the Department of Health, Education, and Welfare, pointed out in one of his
speeches that evaluation methodology is inherently conservative. It gives a tough test of differences
because that's the way the statistical paradigm is structured. In other words, the structure of
evaluation methodology is prone to give a finding of "no significant differences" when, in fact, there may
be differences which are ignored or overlooked. So there is and will continue to be considerable
debate about the best roles evaluators should take, and the type of data most useful, and so on. It
seems to me that these changes were gradual. I'm making them seem more dramatic because I'm
looking at them in retrospect. These changes also seem to be characteristic of changes in other
disciplines.
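
The point about a conservative statistical paradigm can be illustrated with a small simulation.
This is a hedged sketch, not an analysis from the talk: the sample size (30 per group), the true
effect (0.2 standard deviations), the cutoff of 2.0 for the t statistic, and the function names
are all assumptions chosen for illustration.

```python
import random
import statistics


def rejects_null(n, true_effect, rng, critical_t=2.0):
    """Simulate one two-group study and report whether it finds a difference.

    critical_t=2.0 approximates the two-sided 5 percent cutoff for about
    60 degrees of freedom (an assumption made for this sketch).
    """
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    treated = [rng.gauss(true_effect, 1.0) for _ in range(n)]
    diff = statistics.mean(treated) - statistics.mean(control)
    # Standard error of the difference between two independent means.
    se = (statistics.variance(control) / n + statistics.variance(treated) / n) ** 0.5
    return abs(diff / se) > critical_t


def detection_rate(n=30, true_effect=0.2, trials=2000, seed=0):
    """Share of simulated studies that detect a difference that really exists."""
    rng = random.Random(seed)
    hits = sum(rejects_null(n, true_effect, rng) for _ in range(trials))
    return hits / trials


rate = detection_rate()
print(f"real difference detected in {rate:.0%} of simulated studies")
```

With a small but real treatment effect and modest samples, most simulated studies report "no
significant difference" even though the difference is built into the data. Calling
detection_rate(true_effect=1.0) shows the same test detecting a large effect almost every time.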

At this point I would like to discuss the evaluation community's reaction to the specter of
politics, a factor which grew in our awareness each year. By politics I'm talking not only about
major legislation and national politics, but politics at every level. It seems to me that when the
discussion and execution of evaluation entered the area of politics, the first set of responses we
made as evaluators was wholly predictable. We thought we should implement old and important
values in an effort to control the situation since, as evaluators and researchers, we were interested
in control. We responded to the politics of the situation somewhat like a paramecium that ingests
by envelopment - we tried to confront politics by surrounding and absorbing it. We thought we
could gain control of what we saw as a political incursion into our area. Our first thought was,
"Why are they messing with us?" We also thought we could manage what we saw as the "irrational"
side. What we wanted to do was to bring that whole set of experiences within the boundaries of
"mainline" educational R&D.

Let me give you one or two examples. To deal with the conflicting goals, a political problem
which was being loudly articulated in the very early 1970s by various constituencies with interests
in school programs, evaluators borrowed the needs assessment idea from the community
development people in sociology. So the evaluation profession's response to politics at that time could be
characterized by such statements as "We can solve that problem by giving everybody an opportunity
to say what they want to say in a needs assessment and that will keep it nice and neat." We
developed systematic procedures. The idea was that we were going to have an expression of
pluralistic views. In an attempt to control the "politicizing" of evaluation findings, which meant, in my
view, that most evaluation work was being used only as a device to argue for or against favorite
programs, evaluators developed and promoted adversary or advocacy forms of evaluation where
pros and cons were pitted against each other. I'm sure you know the work of Bob Wolf, Marilyn
Kourilsky, and a number of other people who have tried this kind of technique. From the market
research area, we borrowed the idea that we had to write reports - different reports for different
constituencies - because it was important to keep building our connections with those
constituencies. So we started generating more paper.

What those illustrations show is that people in evaluation tried to transform existing societal
and political reality into procedures that could be controlled, and we have perpetrated some
anomalies on the body of education by doing that. We have community advisory committees
providing input and we do needs assessments - some school districts do thirteen of them a year,
one for each special program they are involved in. In California, we have school site councils which
are supposed to provide continuing input into assessment of local programs. We allow the
articulation of reams of precise objectives. When I was here six years ago, you had shelves of vocational
education objectives; we at the California R&D center had our 578,000 reading objectives, and the
Wisconsin R&D centers had theirs. Writing objectives was our way of trying to account for
pluralistic views. We reasoned that people would think of vocational education, or reading, or some other
area as consisting of alternative views, and the only way we could articulate those views was by
writing objectives. Now let's observe an individual case. Suppose you have an evaluator who is
trained in educational research. That evaluator confronts a program which values diversity over
performance, which values the distribution of program over treatment, and which values activity
over outcome. What this evaluator tries to do is to make some sense out of all that, and he or she
finds, from the point of view of the educational researcher, that this is an impossible task.

I finally discovered a book by Aaron Wildavsky, a political scientist who makes the point that
politics and planning (planning is his word for any systematic intellectual activity) are equally
rational. It isn't that the researchers are rational and the politicians are irrational, but that the
norms of planning or evaluation contrast with the norms of politics. He points out that the norms
of planning (i.e., evaluation) are methodological norms and exist without content. And the norms
of politics have only content; that is, what you try to do in a political situation is to agree upon
something in the content area. We have, as evaluators, mounted some very interesting alternative
responses to the political view of evaluation. Some people in the field argue that the uses of
evaluation are primarily political; and we should, therefore, put politics first and use evaluation as
persuasion. They say that our primary effort should be to convince people of the value of certain
educational programs if, indeed, we believe that those programs have value. Other people have gone
into other lines of work, back to the luxury of small-scale laboratory experimentation where a
referee is required only at the journal level. Many have persisted, and I feel I belong in this group;
but I may be moving into the first one soon.

Wildavsky makes some claims which may or may not be true, but he says that the social policy 
researchers did not have to experience the same shock of context and change which people in 
education experienced. Education has always had a political side, but the researchers have generally 
been insulated from it. Only recently has this changed. The social policy people do not normally
assume to control outcomes (with the exception of Henry Kissinger). Policy people are different
from educators because they have not very often had that heady, indelible experience of seeing
performance levels for kids change from 60 percent to 90 percent because of something they did.
There are some differences in our background, too. Policy analysts are trained in the notion that
politics is a reality from start to finish and not something transmuted into life from the front pages
of the morning newspaper. Some of the policy analysts even appear to think that politics is fun,
not an incursion, and that is a concept which is hard for me to cope with. In contrast to the
educational researcher's response, their point is not to get control of politics, but to let it happen. Their
goal is not to quiet it down, not to strip the work of biases, not to make goals and findings crystal
clear and able to stand on their own merits - all the things educational researchers talk about doing.
Instead, policy analysts acknowledge and embrace political reality. They make decisions in terms 
of how they will "play to appropriate audiences." Ambiguity, which they see as allowing both 
personal and satisfying interpretation, is not always a fault; sometimes they push it. Goals are 
multiple, outcomes are blurred, and people can feel that their own priorities are taken into account. 
It seems to me that these policy analysts have taken a particular point of view - a view which you 
may have some moral problems with, but a view, nonetheless - of how one merges scientific or 
rational planning with the political reality. 

It's interesting to me that there are methodologists in our field who have been able to do this
themselves. One of those people is Cronbach at Stanford. In his most recent monographs he's been 
writing mostly about the political problems as well as the methodological differences that evaluators 
have to deal with. He seems to be a person who can transcend some of those issues. There are a 
number of people who can't do that, and these people have done some really strange things. For 
example, there are those in evaluation who make political connections. Such a person might form a 
connection with a certain politician and the two together form a kind of hybrid person. One person 
can't deal with both politics and rationality, so the two form a team. They're like Siamese twins. 
Other people have decided that they will "sit at the foot of" the contracting agency and try to 
provide information. 

We have an interesting problem with our contracting agencies, and I'm dealing not only with 
NIE, although they're my favorite agency. First, at the agency level, there is an erosion of belief 
in the expertise of the evaluator to make unchallengeable technical decisions. The "hired gun"
strategy which Mike Patton talked about in his book used to be a device which demonstrated the 
proposition that evaluation people, like education professionals in general, love to disagree on both 
major and minor points. The fact that we do this so often and so publicly makes the credibility of 
anything we do subject to attack, and everyone knows that any study can be ripped apart depending 
on the point of view of the critic. Any evaluation is subject to some technical disagreement, and 
this erodes credibility. Credibility is one of the most important coins that a person has in a political 
context, so that puts us one down. Second, there is a realization that evaluation offers a terrific 
means for attacking individuals who seem to be above or insulated from more typical approaches
at discredit. Let me expand on this a little. We have society to thank or blame for the fact that
our usual modes of discrediting people aren't working anymore. Politicians used to be discredited
on the basis of their marital status, sexual behaviors, substance abuse, or fraud. But these seem not
to have the cogency that they had in the past. In fact, a clear apology seems all that's necessary
even in the face of indictment. You can't get rid of somebody as easily as you could in the past.
Evaluation findings now can be used as a supplement to discredit public officials whose rhetorical
claims, as usual, outstrip their programs. So evaluation has become an important political tool.

I'll give you a short example. Jerry Brown, the governor of California, had a pretty good idea
about linking the university systems and the public school systems by way of a stationary satellite.
It wasn't going to cost that much and there were all kinds of reasons to do it. The problem was that
those who opposed the program linked it to what they considered some of his "wilder" ideas. They
used this particular satellite program to discredit him and his political ideas in general, and it marked
the beginning of the continuing problem he's been having with the state legislature in California.
That's an example of how an evaluation activity can really discredit a person. The intent here was
to discredit. Since Jerry Brown is politically astute, he got by.

These developments have had several results. The first is the belief among our contracting
agencies that technical dispute equals arbitrariness, so they came to believe that any method is a
good method because all methods are likely to generate complaints. Second, it has become evident
that evaluation results can depose power; and third, the successes of these combined political
efforts have given the politicians and bureaucrats a sense of personal power. They now
"understand" the heretofore arcane procedures of research and evaluation. This is a very scary phenomenon.

Let me give you a couple of ways that you see it exhibited. One is in the quality of the RFPs
that you are receiving. If your sentiments are the same as ours, those RFPs are prescriptive to the
point of being nonsensical; that is, the sponsoring agencies are now telling us what sample
procedures to use and what phases to go through. It used to be that they bought your brain; now it's just
your arms and legs. What they really want is research assistants - robots to do their work out in the
field and then to provide them with results which they can decide how to "patty cake" into good
shape. That is one serious way of looking at the notion that the contract agencies know better than
researchers and evaluators how to do educational research, because obviously everything is subject
to dispute. Sponsoring agencies think they know everything because of the educators' discussions
with them. Here is a second example, which is a really wonderful example in some ways. We were
working on a proposal last summer for the state of California - a million dollar contract. About
twenty days into the response period, we got an amendment by telegram advising us that the
California state legislature had mandated control groups. You have to think about that. Here is a
political body voting on whether you should have control groups in a research study. That's scary
in some ways if you think about independence, lack of bias, the quality of the kind of intellectual
rationality you want to bring to bear on educational research.

What do I expect to happen with all of this? I'll tell you, and most of this is taken from Aaron
Wildavsky in a book which I recommend to everybody because besides being informative, it's fun
to read. It's called Speaking Truth to Power, which I think is a great title. He has a chapter called
"Strategic Retreat on Objectives: Learning From Failure in American Public Policy." What he says
is that public policy is in a similar state across many social policy areas, not just education. Our
early optimism, in which we were going to change the outcomes significantly and solve all the
social problems, just didn't pan out; and everybody is feeling upset about this. But he feels that
what is happening now is a translation to concern not with outcomes but with process. The
evaluation literature today is greatly concerned with implementation evaluation, and that simply means
making sure that the process happened. A type of study illustrating this is Milbrey McLaughlin and
Paul Berman's Rand studies on innovation.
Wildavsky sees this as a strategic retreat from objective evaluation. He thinks what's going to
happen next is that programs are going to be legislatively formulated exclusively in terms of the
services they provide, and the rhetoric about what they're supposed to do is going to be dropped.
That has a lot of implications for people in evaluation.

What does all this mean? I think it means that evaluation problems might be easier if we could
come to some kind of agreement among the evaluators, the program managers, and the contracting
agencies. If we could agree that activity is a legitimate way to assess the issue of innovation in
educational settings, then the effect of this realignment of value perspectives may work to permit
evaluators and their work once again to hold some trusted status.






QUESTIONS AND ANSWERS 



Question: Would you please comment on what you believe will be the direction of educational
evaluation over the next ten years?

I'll tell you what I'd like to say. I'm a believer in outcomes. I think that a lot of the critics of
quantitative-oriented evaluation studies are absolutely correct, that the basis on which judgments
were made and no significant differences were found was mostly attributable to poor dependent
measures - that is, bad post-tests. I would take the view that the measurement process, if it's going
to be treated seriously in these evaluations and not swamped by this emphasis on service, is going
to have to be made a lot more credible.

One way of making the measurement process credible is to make it publicly accessible. Up to
this point certain people have gotten away with the idea that they were arcane elves in Princeton
making up instruments that everybody could exclaim over, saying, "Yes, if you say this measures
my academic achievement, it does!" If you look at the legislative trends toward public access to
tests in Texas, New York, and Massachusetts (these were introduced nationally a couple of times),
there's a concern that the public needs to have access to these tests, particularly for a constitutional
purpose, a due process purpose. It's only if the students and teachers know what's going to be on a
test that the proper services can be legally assured. That's the basis on which the Florida case was
argued. So our interest in this is to get hold of the measurement side now, and to do it at the point
where most people in school districts think evaluation exists. We learned from a study we did that
about 75 percent of the people in evaluation research units in school districts think that evaluation
is testing. They see those things as isomorphic. What we would like to do is to have the tests exhibit
certain criteria. One of them is public access, and we think the way to do that is through
specifications - not through the annual publication of all the tests. Zacharias at MIT has computed the core
computer space it would take to publish all of the test items annually. Our main concern is with
public access.

Our second concern is with money and conservation of resources. We're doing a study on the
NIE grant (at least we hope we're doing a study on the NIE grant) which is a survey of all testing
practices in public schools. Our understanding of it now is that tests are being regarded as
dysfunctional in the public schools. So now we're thinking in terms of public access, the economy issue,
the notion that tests need to relate to instructional programs very directly, and the idea that teachers
and kids have to regard tests as meaningful and important activities - that is, as measures that have
what we used to call "face validity." This would be some way of grabbing hold of the evaluation
process. If that doesn't happen, what we have to start doing is finding measures and indicators that
have common sense appeal to people. You see, people in testing sometimes argue for tests on the
basis of their validity coefficients, which doesn't make much sense to a lot of people. We have to
produce meaningful tests that make sense to people. That's the only way we're going to be able to
keep the hook into outcomes that we need. That's over the short term. Over the long term
everything will switch back, of course.

Question: What about the role of evaluation in terms of policy?



Well, I think it's absolutely important for us to try to maintain an independent stance in
evaluation. What we tried in California was a terrible mistake. We were a state university evaluating
a state agency on which the superintendent of that agency was a member of the Board of Regents
who also happened to be on the Advisory Board to NIE. We were coming and going. What you end
up doing is getting everybody choosing up their favorite team and getting an evaluation together
that says what the politicians want it to say.


In a political context, I'd like to speak about summative versus formative evaluation, if I can,
in a longer perspective. And from that perspective, I think summative evaluation is a complete
waste of time because nothing that's created ever really gets dropped; it just gets transformed or
renamed. Summative evaluation is irrelevant except in little tiny program comparisons. But in
large-scale Title I type programs, for instance, it's only formative evaluation that counts. It seems to me
that the kinds of things that we should focus on are outcomes and processes that can be manipulated,
that we can do something about. We should not allow educators who would prefer to be
mathematicians or psychologists, those who got into education just because they needed a job, to be the
people who decide the important issues. Those people who are conducting evaluations should be
forced to attend to outcomes and processes over which the schools and the educational professionals
have some input. I think we've been wiped out mostly by data collected about information we have
no control over, nor will we ever have any control over it. That's been done in the name of science
and in the name of comprehensiveness, which I know is a theme. I would push towards very little
evaluation - certainly much less evaluation than we're doing now within each program, but
evaluation targeted so that the information could have direct input to somebody else. I wouldn't collect
evaluation information for the sake of having a "pretty report." Right now most of evaluation is
done for the "pretty report."



Question: What about the long-range evaluation of programs like Project Headstart? You could
be expected to evaluate the educational outcomes, but how would you evaluate the
non-educational, for instance, the health aspects of such a program?

Well, I probably wouldn't; and I would probably question those data. I have some concern about
the correlational value, how those kids were selected and whether they were likely to be healthier,
and so on. In general, I think that there's too much emphasis on evaluation right now. I think we
should spend a lot less money on evaluation and a lot less money on testing.



Question: Why did you choose to quote Aaron Wildavsky so extensively in relation to
policy analysis?

Why Aaron Wildavsky and not someone else? Essentially for reasons that probably have more to do
with happenstance and not for the best reasons of all. Aaron came at my invitation to the R & D
center as a visiting scholar. He conducted a seminar for members of the state legislature and their
staff people on what research had to say to policy. And why did I ask him to come as a visiting
scholar? Because I asked the legislators who in the California university system they would most
like to have come to speak to them. They requested Aaron Wildavsky. So first of all, I had a personal
connection. Secondly, it seems to me that he talks in very practical terms about evaluation and
policy. I regard myself as an activist. It just seems to me that Wildavsky had an activist's approach.
I know it's incremental, and I know what he advocates is different from other social policy
approaches. But that was the basis - no lengthy library search.

Question: No one seems to agree about outcomes. Instead, they seem to agree on the process
or activity. How, then, can you focus on outcomes?

I suppose that my background shows, which is research on learning and instruction. I accept much
of what you say. I've had quite a bit of experience with teachers who are dealing with some really
basic areas like reading, writing, and math. We have a research project at the R & D center on writing
assessment. It's remarkable how people in English departments can agree on what the criteria are
for good writing. So I think it's possible to find places where people agree. What I think we should
do is exploit those now. It may be that when we start getting into some general, conceptual matter,
where we attempt to define it and assess a person's understanding of it in an X or Y manner, we'll
end up with fragmentation. I am not an advocate of outcomes or objectives for every area of
instruction or every kind of instructional program. In some cases, I think that the best we can do
is to show people that they have greater flexibility to react to a variety of indicators or outcomes
than they had before and not even worry about what those are. But for the areas where we still
haven't demonstrated to the satisfaction of the electorate and the public that we know something
about them, like reading and writing instruction, I think we should get hold of those now and not
just let them go.

The interactions, if you want to think statistically for a minute, are really impressive. As a
measure of instruction, let's take writing for an example. I don't know if this is the case in your
state (I speak from our experience in California); but if you go into the high school classroom you
will find that kids are not getting writing instruction. Kids are not being asked to write essays much.
We did a study which showed that the average number of writing assignments for a tenth-grade class
in composition was one a month, and the average length was one page. These are tenth-grade kids.
When we asked the teachers, "Why aren't you asking these kids to write more?" the response was
"It takes too long to score essays." We have a measurement or performance issue involved; that is,
what counts as adequate feedback, and how can you provide it to the kids? In another study we did,
data came back in a way that showed us some of the kids didn't do well. When we followed up on
that, a sizable percentage of the teachers said they didn't re-teach a topic if the students didn't do
well; they just dropped it. That response has implications for methodology and for outcomes.
Otherwise you may be going through some mindless activity.



I also understand the weakness of the causal chains on what one does instructionally. What
concerns me in this focus on process is that people can so easily get hold of the wrong processes.
One of the requirements for a certain program is individualized instruction. This is a nice catchword,
and everybody can interpret it differently. Through the guidelines it was interpreted to mean that
you had to have an individual progress record on each kid. So schools set up "war rooms" with
project and status charts. That's okay for projects, but for individual kids it's sort of scary to see
that kind of thing up on the bulletin board - reports on each child by subject matter and
performance level.

I made some tongue-in-cheek remarks about naturalistic data techniques, but that's exactly
what I suggest we do in our evaluation studies: use mixed models, use highly quantitative approaches,
and explore ways to develop verifications or hypotheses. We use both outcomes and processes. I
think that's the only thing anyone can do right now. Our real concern is not making the evaluation
load too onerous to the data provider. For instance, we've been concentrating very strongly on
sampling techniques so that all the kids or all the teachers are not asked to provide information.
I still think there's sufficient agreement on some points to allow us to push on outcomes a little
bit, but not exclusively. You're right about that.

Question: Would you comment more fully on the role of policy analysts?



Well, it may turn out that evaluators in future will be differentiated in terms of the ones who
move in the general direction of policy analysis, as opposed to those who are more technology
oriented - those who do the actual design, collection, and preliminary interpretation of information.
We know that right now the policy people often cast their questions in ways that are unanswerable
by data. That may be deliberate. In the cases where it isn't, I think there have been a few instances
where policy analysts and evaluators have gotten together on a problem, like an RFP provision, and
tried to work out what an evaluation study should have in it. It isn't clear to me how they relate.
I wish I could have a pat answer for you. I think policy analysis is something that educational people
haven't noticed much before; they assumed that policy analysis work resided at Harvard, Michigan,
and Rand, and that was it. I think that as we understand more about it, we'll be able to see
some changes.